Canh Quang TRAN Hiroshi KAWAGUCHI Takayasu SAKURAI
A low-power FPGA design approach is proposed based on a fine-grain VDD control scheme called micro-VDD-hopping. Four configurable logic blocks (CLBs) are grouped into one block where VDD is shared. In the micro-VDD-hopping scheme, VDD in each block is changed between VDDH (high VDD) and VDDL (low VDD) spatially and temporally in order to achieve lower power without performance degraded. A low-power level shifter that has less contention is also proposed for low-swing inter-block signals. The FPGA incorporates the Zigzag power-gating scheme, in which special care has been taken to cope with a sneak leakage-path problem. A test chip was fabricated using a 0.35-µm CMOS technology, together with the conventional fixed-VDD FPGA for comparison. Measurement results show that dynamic power in the proposed scheme can be reduced by 86% when a frequency is half of the maximum one. Simulation using a 90-nm CMOS technology shows that leakage power can be reduced by 97%, when the proposed method is used. The area overhead of the proposed FPGA is 2%.
Koichiro ISHIBASHI Tetsuya FUJIMOTO Takahiro YAMASHITA Hiroyuki OKADA Yukio ARIMA Yasuyuki HASHIMOTO Kohji SAKATA Isao MINEMATSU Yasuo ITOH Haruki TODA Motoi ICHIHASHI Yoshihide KOMATSU Masato HAGIWARA Toshiro TSUKADA
Circuit techniques for realizing low-voltage and low-power SoCs for 90-nm CMOS technology and beyond are described. A proposed SAFBB (self-adjusted forward body bias techniques), ATC (Asymmetric Three transistor Cell) DRAM, and ADC using an offset canceling comparator deal with leakage and variability issues for these technologies. A 32-bit adder using SAFBB attained 353-µA at 400-MHz operation at 0.5-V supply voltage, and 1 Mb memory array using ATC DRAM cells achieved 1.5 mA at 50 MHz, 0.5 V. The 4-bit ADC attained 2 Gsample/s operation at a supply voltage of 0.9 V.
A simple low power low phase noise LC QVCO (Quadrature Voltage Controlled Oscillator) topology is proposed. The topology minimizes phase noise by eliminating the contributions from the tail current source and coupling transistors. With no more than 3.36 mW power consumption from a 1.2 V power supply, the VCO achieves -124 dBc/Hz phase noise performance at 1 MHz offset from the 2.85 GHz carrier frequency.
We propose and implement a four-phase handshake protocol for bundled-data asynchronous circuits with consideration given to power consumption and area. A key aspect is that our protocol uses three phases for generating the matched delay to signal the completion of the data-path stage operation whereas conventional methods use only one phase. A comparison with other protocols at 0.18 µm process showed that our protocol realized lower power consumption than any other protocol at cycle times of 1.2 ns or more. The area of the delay generator required for a given data-path delay was less than half that of other protocols. The overhead of the timing generator was the same as or less than that of other protocols.
This paper proposes a novel cache architecture for low power consumption, called "Adaptive Way-Predicting Cache (AWP cache)." The AWP cache has multi-operation modes and dynamically adapts the operation mode based on the accuracy of way-prediction results. A confidence counter for way prediction is implemented to each cache set. In order to analyze the effectiveness of the AWP cache, we perform a SRAM design using 0.18 µm CMOS technology and cycle-accurate processor simulations. As the results, for a benchmark program (179.art), it is observed that a performance-aware AWP cache reduces the 49% of performance overhead caused by an original way-predicting cache to 17%. Furthermore, a energy-aware AWP cache achieves 73% of energy reduction, whereas that obtained from the original way-predicting scheme is only 38%, compared to an non-optimized conventional cache. For the consideration of energy-performance efficiency, we see that the energy-aware AWP cache produces better results; the energy-delay product of conventional organization is reduced to only 35% in average which is 6% better than the original way-predicting scheme.
Chun-Lung HSU Wen-Tso WANG Ying-Fu HONG
This work presents a frequency-scaling low-power (FSLP) design methodology for managing power consumption of cores in the tile-based network-on-chip (NOC) architecture. A moving picture experts group (MPEG) core is tested using the field-programmable gate array (FPGA) implementation to verify the feasibility of the proposed method. Measurement results show that about 30% power consumption can be saved in the MPEG core and reveal that the proposed FSLP design method can be suitable for cores in the tile-based NOC applications.
Kentaro KAWAKAMI Miwako KANAMORI Yasuhiro MORITA Jun TAKEMURA Masayuki MIYAMA Masahiko YOSHIMOTO
To achieve both of a high peak performance and low average power characteristics, frequency-voltage cooperative control processor has been proposed. The processor schedules its operating frequency according to the required computation power. Its operating voltage or body bias voltage is adequately modulated simultaneously to effectively cut down either switching current or leakage current, and it results in reduction of total power dissipation of the processor. Since a frequency-voltage cooperative control processor has two or more operating frequencies, there are countless scheduling methods exist to realize a certain number of cycles by deadline time. This proposition is frequently appears in a hard real-time system. This paper proves two important theorems, which give the power-minimum frequency scheduling method for any types of frequency-voltage cooperative control processor, such as Vdd-control type, Vth-control type and Vdd-Vth-control type processors.
Yuichiro MURACHI Koji HAMANO Tetsuro MATSUNO Junichi MIYAKOSHI Masayuki MIYAMA Masahiko YOSHIMOTO
This paper describes a 95 mW MPEG2 MP@HL motion estimation processor core for portable and high-resolution video applications such as that in an HD camcorder. It features a novel hierarchical algorithm and a low-power ring-connected systolic array architecture. It supports frame/field and bi-directional prediction with half-pel precision for 19201080@30 fps resolution video. The search range is 12864 pixels. The ME core integrates 2.25 M transistors in 3.1 mm3.1 mm using 0.18-micron technology.
Satoshi KOMATSU Masahiro FUJITA
Energy consumption is one of the most critical constraints in the current VLSI system designs. In addition, fault tolerance of VLSI systems will be also one of the most important requirements in the future shrunk VLSIs. This paper proposes practical low power and fault tolerant bus encoding methods in on-chip data transfer. The proposed encoding methods use the combination of simple low power code and fault tolerant code. Experimental results show that the proposed methods can reduce signal transitions by 23% on the bus with fault tolerance. In addition, circuit implementation results with bus signal swing optimization show the effectiveness of the proposed encoding methods. We show also the selection methodology of the optimum encoding method under the given requirements.
Chi-Chia SUNG Shanq-Jang RUAN Bo-Yao LIN Mon-Chau SHIE
In recent years, the demand for multimedia mobile battery-operated devices has created a need for low power implementation of video compression. Many compression standards require the discrete cosine transform (DCT) function to perform image/video compression. For this reason, low power DCT design has become more and more important in today's image/video processing. This paper presents a new power-efficient Hybrid DCT architecture which combines Loeffler DCT and binDCT in terms of special property on luminance and chrominance difference. We use Synopsys PrimePower to estimate the power consumption in a TSMC 0.25-µm technology. Besides, we also adopt a novel quality assessment method based on structural distortion measurement to measure the quality instead of peak signal to noise rations (PSNR) and mean squared error (MSE). It is concluded that our Hybrid DCT offers similar quality performance to the Loeffler, and leads to 25% power consumption and 27% chip area savings.
Mei-Fen CHOU Wen-Shen WUEN Chang-Ching WU Kuei-Ann WEN Chun-Yen CHANG
A CMOS low noise amplifier (LNA) for low-power ultra-wideband (UWB) wireless applications is presented. To achieve low power consumption and wide operating bandwidth, the proposed LNA employing stagger tuning technique consists of two stacked common-source stages with different resonant frequencies. This work is implemented in 0.18-µm CMOS process and shows a 2.4-9.4-GHz bandwidth. The amplifier provides a maximum forward gain (S21) of 10.9 dB while drawing 7.1 mW from a 1.8-V supply. A noise figure as low as 4.1 dB and an IIP3 of -3.5 dBm have been demonstrated.
Kicheol KIM Dongsub SONG Incheol KIM Sungho KANG
A new low power test pattern generator (TPG) which can effectively reduce the average power consumption during test application is developed. The new TPG reduces the weighted switching activity (WSA) of the circuit under test (CUT) by suppressing transitions at some primary inputs which make many transitions. Moreover, the new TPG does not lose fault coverage. Experimental results on the ISCAS benchmark circuits show that average power reduction can be achieved up to 33.8% while achieving high fault coverage.
Zhiqiang YOU Ken'ichi YAMAGUCHI Michiko INOUE Jacob SAVIR Hideo FUJIWARA
This paper proposes two power-constrained test synthesis schemes and scheduling algorithms, under non-scan BIST, for RTL data paths. The first scheme uses boundary non-scan BIST, and can achieve low hardware overheads. The second scheme uses generic non-scan BIST, and can offer some tradeoffs between hardware overhead, test application time and power dissipation. A designer can easily select an appropriate design parameter based on the desired tradeoff. Experimental results confirm the good performance and practicality of our new approaches.
Jianping HU Tiefeng XU Hong LI
This paper presents a novel low-power register file based on adiabatic logic. The register file consists of a storage-cell array, address decoders, read/write control circuits, sense amplifiers, and read/write drivers. The storage-cell array is based on the conventional memory cell. All the circuits except the storage-cell array employ CPAL (complementary pass-transistor adiabatic logic) to recover the charge of large node capacitance on address decoders, bit-lines and word-lines in fully adiabatic manner. The minimization of energy consumption was investigated by choosing the optimal size of CPAL circuits for large load capacitance. The power consumption of the proposed adiabatic register file is significantly reduced because the energy transferred to the large capacitance buses is mostly recovered. The energy and functional simulations are performed using the net-list extracted from the layout. HSPICE simulation results indicate that the proposed register file attains energy savings of 65% to 85% as compared to the conventional CMOS implementation for clock rates ranging from 25 to 200 MHz.
Mohammed A. ELGAMEL Md Ibrahim FAISAL Magdy A. BAYOUMI
About 20-45% of the total power in any VLSI circuit is consumed by the clocking system and 90% of this power consumption is spent by flip-flops. Wider datapaths, deeper pipelines, and increasing number of registers in modern processors have underscored the importance of the flip-flops. As a result, the flip-flops' performance metrics such as, power, delay, and power delay product will become a crucial factor in overall performance of processors. As technology is moving into deep submicron level, noise immunity and noise generated by any component in a digital device is also becoming a vital factor in circuit design. This paper studies various flip-flop designs for their noise immunity and noise generation metrics. It categorizes the flip-flops and reports extensive simulation results for best representative examples including the newly proposed one from the group (a patent is filed for this flip-flop). It compares power, delay, power delay product, number of transistors, number of clocked transistors, noise immunity, and noise generation for flip-flops that are reported as ones with the best performances in the literature.
Tetsuya HIROSE Toshimasa MATSUOKA Kenji TANIGUCHI Tetsuya ASAI Yoshihito AMEMIYA
An ultralow power constant reference current circuit with low temperature dependence for micropower electronic applications is proposed in this paper. This circuit consists of a constant-current subcircuit and a bias-voltage subcircuits, and it compensates for the temperature characteristics of mobility µ, thermal voltage VT, and threshold voltage VTH in such a way that the reference current has small temperature dependence. A SPICE simulation demonstrated that reference current and total power dissipation is 97.7 nA, 1.1 µW, respectively, and the variation in the reference current can be kept very small within 4% in a temperature range from -20 to 100.
Toshihiro MATSUDA Ryuichi MINAMI Akira KANAMORI Hideyuki IWATA Takashi OHZONE Shinya YAMAMOTO Takashi IHARA Shigeki NAKAJIMA
A pure CMOS threshold-voltage reference (VTR) circuit achieves temperature (T) coefficient of 5 µV/(T = -60+100) and supply voltage (VDD) sensitivity of 0.1 mV/V (VDD = 35 V). A combination of subthreshold current, linear current and saturation current in n-MOSFETs provides a small voltage and temperature dependence. Three different regions in I-V characteristics of MOSFETs generate a constant VTR based on threshold voltage at 0 K. A feedback scheme from the reference output to gates of n-MOSFETs extremely stabilizes the output. The circuit consists of only 17 MOSFETs and its simple scheme saves the die area, which is 0.18 mm2 in the TEG (Test Element Group) chip fabricated by 1.2 µm n-well CMOS process.
Reiko KOMIYA Koji INOUE Vasily G. MOSHNYAGA Kazuaki MURAKAMI
As the transistor feature sizes and threshold voltages reduce, leakage energy consumption has become an inevitable issue for high-performance microprocessor designs. Since on-chip caches are major contributors of the leakage, a number of researchers have proposed efficient leakage reduction techniques. However, it is still not clear that 1) what kind of algorithm can be considered and 2) how much they have impact on energy and performance. To answer these questions, we explore run-time cache management algorithm, and evaluate the energy-performance efficiency for several alternatives.
Norihito SUZUKI Takahide KADOYAMA Masayuki KATAKURA
A GPS radio design for a complete single chip GPS receiver using 0.18-µm CMOS is presented. The complete single chip GPS receiver satisfies several key requirements for mobile applications, such as compactness, low power, and high sensitivity. The radio part, including the RF front end, the RF/IF PLLs, and IF functions, occupies 2.0 2.3 mm in a total chip area of 6.3 6.3 mm. It is fabricated using 0.18-µm CMOS technology utilizing MIM capacitors. The radio part operates within a 1.6 to 2.0 V supply voltage range and consumes 27 mW at 1.8 V. The whole GPS SoC consumes 57 mW for a fully functional chip and provides a high sensitivity of -152 dBm. The radio design features countermeasures against substrate coupling noise from the digital part.
Toshinori SATO Akihiro CHIYONOBU
Power consumption is a major concern in embedded microprocessors design. Reducing power has also been a critical design goal for general-purpose microprocessors. Since they require high performance as well as low power, power reduction at the cost of performance cannot be accepted. There are a lot of device-level techniques that reduce power with maintaining performance. They select non-critical paths as candidates for low-power design, and performance-oriented design is used only in speed-critical paths. The same philosophy can be applied to architectural-level design. We evaluate a technique, which exploits dynamic information regarding instruction criticality in order to reduce power. We evaluate an instruction steering policy for a clustered microarchitecture, which is based on instruction criticality, and find it is substantially energy-efficient while it suffers performance degradation.